Abstract: This paper studies the problem of high-dimensional 
multiple testing and sparse recovery from the perspective of 
sequential analysis. In this setting, the probability of error is 
a function of the dimension of the problem. A simple sequential 
testing procedure is proposed. We derive necessary conditions for 
reliable recovery in the non-sequential setting and contrast them 
with sufficient conditions for reliable recovery using the proposed 
sequential testing procedure. Applications of the main results 
to several commonly encountered models show that sequential 
testing can be exponentially more sensitive to the difference 
between the null and alternative distributions (in terms of the 
dependence on dimension), implying that subtle cases can be 
much more reliably determined using sequential methods. 

I. Introduction 

High-dimensional testing and sparse recovery problems arise in a broad range of scientific and engineering applications. The basic problem is summarized as follows. Let $\theta \in \mathbb{R}^n$ denote a parameter vector. The dimension $n$ may be very large (thousands or millions or more), but $\theta$ is sparse in the sense that most of its components are equal to a baseline/null value denoted by $\theta_0$ (e.g., $\theta_0 = 0$). The support of the sparse subset of components that deviate from the baseline is denoted by $S$:

$$S = \{ i : \theta_i \neq \theta_0 \}. \tag{1}$$

The parameter is observed stochastically according to

$$y_i \sim f(y_i \mid \theta_i), \tag{2}$$

where $f(\cdot \mid \theta)$ is a parametric family of densities indexed by a scalar parameter $\theta \in \mathbb{R}$. The goal of the high-dimensional testing and sparse recovery problem is to identify $S$ from observations of this form.

The conventional theoretical treatment of this problem assumes that a set of observations is collected prior to data analysis. Typically, in what we refer to as the non-sequential setting, each of the $n$ components is measured (one or more times) according to the model above and then component-wise tests are performed to estimate $S$.

This paper investigates the high-dimensional testing problem from the perspective of sequential analysis. In this setting, observations are gathered sequentially and adaptively, based on information gleaned from previous observations. This allows the observation process to focus sensing resources on certain components at the expense of ignoring others. For example, the process might first measure each component once, then focus on a reduced subset of 'interesting' components in a second pass. Such approaches have attracted attention lately due to their importance in the biological sciences. They are also relevant in communications problems including spectrum sensing in cognitive radio, one of the motivations for our work.

To compare sequential and non-sequential methods we impose a budget on the total number of observations that can be made. The main results show that sequential methods can be dramatically more sensitive to small differences between the baseline/null $\theta_0$ and the alternative values of $\theta_i$. Our approach is similar to the so-called distilled sensing method proposed in [1], [2]; however, there are two main distinctions. First, the results in this paper are applicable to a large class of problems characterized by one-sided tests; the distilled sensing approach is specific to the Gaussian setting. Second, here we are concerned with the probability of error in identifying $S$; distilled sensing controls the false discovery and non-discovery rates, which is less demanding than error rate control.

To give a sense of the main results, consider the case in which $f(\cdot \mid \theta)$ is a Gaussian with mean $\theta$ and variance 1. If $\theta_0 = 0$ and the alternative is $\theta_i = \theta_1 > 0$ for $i \in S$, then reliable detection (probability of error tending to zero as $n \to \infty$) is not possible using non-sequential methods if $\theta_1 < \sqrt{2 \log n}$. In contrast, a sequential method that we will demonstrate is reliable as long as $\theta_1 > \sqrt{4 \log |S|}$, where $|S|$ is the cardinality of the support set. This shows the sequential method is more sensitive whenever $|S| < n^{1/2}$; i.e., the sparse setting. The improvement is especially remarkable when $S$ is very sparse; e.g., if $|S| \approx \log n$, then sequential methods succeed as long as $\theta_1$ is larger than a constant multiple of $\sqrt{\log \log n}$. The gains provided by the sequential method are even more pronounced for certain one-sided distributions. In spectrum sensing (where the measurements follow gamma distributions), to within constant factors, if the SNR grows as $\log(|S| \log n)$ then the sequential method is reliable, but any non-sequential procedure is unreliable if the SNR grows slower than $\sqrt{n}$. To dramatize this result, if $|S| \approx \log n$, then the gap between these conditions is doubly exponential in $n$.

II. Problem Statement 

For $i = 1, \dots, n$ let $y_i$ be a random variable distributed according to (2). We say component $i$ follows the null distribution if $i \notin S$, where $S$ is defined in (1), thus $y_i \sim f(\cdot \mid \theta_0)$. Conversely, we say that component $i$ follows the alternative if $i \in S$, and $\theta_i \neq \theta_0$. Define $s := |S|$ as the level of sparsity, and assume that $s \ll n$. Our goal is exact recovery of $S$.

With some abuse of notation, let $\theta_1$ be a parameter corresponding to an alternative distribution (not the parameter corresponding to component 1 of the vector $\theta$). For simplicity we let $\theta_i = \theta_1$ for $i \in S$, that is, all components following the alternative follow the same distribution $f(\cdot \mid \theta_1)$. This allows a simple binary hypothesis test for inclusion in $S$. A more general treatment could test composite alternatives (in which the alternative distributions are a family of distributions with unknown $\theta_i \neq \theta_0$); in this setting, the results of this paper can be viewed as quantifying the minimum separation required between the null and any of the alternative distributions. Extensions of this sort are obvious in the context of the applications considered in Section IV.

To index multiple independent identically distributed observations of $y_i$, we introduce a second subscript $j$: $y_{i,j}$ is the $j$th observation of the $i$th component. Define the log-likelihood ratio statistic corresponding to index $i$ as

$$T_{i,m} := \sum_{j=1}^{m} \log \frac{f(y_{i,j} \mid \theta_1)}{f(y_{i,j} \mid \theta_0)}.$$

The distribution of the log-likelihood ratio depends on the number of independent observations, indicated by the subscript $m$. As both sequential and non-sequential tests compare $T_{i,m}$ to a threshold, we refer to $T_{i,m}$ as the test statistic.

A. Measurement Budget 

To compare sequential and non-sequential methods, we impose a budget on the total number of measurements. A single measurement consists of observing, for example, $y_{1,1}$, and thus observing $(y_{1,1}, \dots, y_{n,1})$ requires $n$ measurements. The total number of measurements is limited to $N \le 2mn$, where $m \ge 1$ is an integer.

B. Non-Sequential Testing 

The non-sequential approach distributes the measurement budget uniformly over the $n$ components, making $2m$ i.i.d. observations of each. Let $y_{i,1}, \dots, y_{i,2m}$ denote the $2m$ observations of component $i$, and let $T_{i,2m}$ denote the corresponding test statistic. The test takes the form

$$T_{i,2m} \underset{H_0}{\overset{H_1}{\gtrless}} \tau \tag{3}$$

and, for some $\tau$, is optimal in terms of probability of error among all (non-sequential) component-wise estimators. The estimated support set at threshold $\tau$ is

$$\hat{S}_\tau := \{ i : T_{i,2m} > \tau \}.$$
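The non-sequential test can be sketched in a few lines. The following is a minimal illustration, assuming the unit-variance Gaussian model of Section IV-A with the sample mean as the test statistic; the function name `non_sequential_support` and the particular values of `theta`, `m`, and `tau` are hypothetical choices made only for this example.

```python
import random

def non_sequential_support(theta, m, tau, rng):
    """Threshold the mean of 2m unit-variance Gaussian draws per component."""
    estimate = set()
    for i, mean in enumerate(theta):
        # Spend 2m measurements on every component, regardless of the outcome.
        stat = sum(rng.gauss(mean, 1.0) for _ in range(2 * m)) / (2 * m)
        if stat > tau:
            estimate.add(i)
    return estimate

# Illustrative setup (assumed values): n = 1000, three strong components.
theta = [0.0] * 1000
for i in (3, 17, 256):
    theta[i] = 6.0
print(sorted(non_sequential_support(theta, m=2, tau=3.0, rng=random.Random(0))))
```

Every component receives the same $2m$ measurements, so the total cost is exactly $2mn$ whatever the data show.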



C. Sequential Thresholding 

The sequential method we propose is based on the following simple bisection idea. Instead of aiming to identify the components in $S$, at each step of the sequential procedure we aim to eliminate about 1/2 of the remaining components not in $S$ from further consideration. The components that remain under consideration after $K$ such steps form our estimate of the set $S$.

Suppose we begin by using half of our measurement budget to collect $m$ observations of each component. The test statistic for each is $T_{i,m}$, a function of $(y_{i,1}, \dots, y_{i,m})$. Assume $\theta_0$ is known and let $T_{i,m} \mid \theta_0$ denote the random variable whose distribution is that of the test statistic under the null. Consider the threshold test

$$T_{i,m} > \mathrm{median}(T_{i,m} \mid \theta_0).$$

For $i \notin S$, the test statistic $T_{i,m}$ falls below $\mathrm{median}(T_{i,m} \mid \theta_0)$ with probability 1/2. The threshold test above thus eliminates approximately 1/2 of the components that follow the null. We can next use a portion of our remaining budget of $mn$ to repeat the same measurement and thresholding procedure on the remaining components. Since approximately $n/2$ components remain, this will require $mn/2$ of the remaining budget. Repeating this process for sufficiently many iterations will remove, with high probability, all of the null components. We call this process sequential thresholding and give a formal algorithm below. The output of the procedure, $S_K$, is the estimated support set. Notice that sequential thresholding does not require prior knowledge of the size of the support set.

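Sequential Thresholding

input: $K > 0$ steps, $\gamma_0 := \mathrm{median}(T_{i,m} \mid \theta_0)$
initialize: $S_0 = \{1, \dots, n\}$
for $k = 1, \dots, K$ do
    for $i \in S_{k-1}$ do
        measure: $\{y^{(k)}_{i,j}\}_{j=1}^{m} \sim \prod_{j=1}^{m} f(y^{(k)}_{i,j} \mid \theta_0)$ if $i \notin S$, or $\prod_{j=1}^{m} f(y^{(k)}_{i,j} \mid \theta_1)$ if $i \in S$
        threshold: $S_k := \{ i \in S_{k-1} : T^{(k)}_{i,m} > \gamma_0 \}$
    end for
end for

output: $S_K$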



While sequential thresholding is described and analyzed using a threshold at the median of the null, in practice other thresholds can be used (for example, a threshold at the 95th percentile) and can result in improved performance.
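The procedure can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the unit-variance Gaussian model of Section IV-A, uses the sample mean of $m$ draws as the statistic (so the null median is 0), and the name `sequential_thresholding` and all numeric values are hypothetical.

```python
import math
import random

def sequential_thresholding(theta, m, num_passes, gamma0, rng):
    """Keep the components whose m-sample mean exceeds gamma0 on every pass."""
    survivors = list(range(len(theta)))
    measurements = 0
    for _ in range(num_passes):
        kept = []
        for i in survivors:
            # Fresh measurements on each pass; earlier draws are not reused.
            stat = sum(rng.gauss(theta[i], 1.0) for _ in range(m)) / m
            measurements += m
            if stat > gamma0:
                kept.append(i)
        survivors = kept
    return set(survivors), measurements

# Illustrative setup (assumed values): n = 2000, s = 3, eps = 0.5.
n = 2000
support = {5, 400, 1999}
theta = [5.0 if i in support else 0.0 for i in range(n)]
K = math.ceil(1.5 * math.log2(n))
estimate, used = sequential_thresholding(theta, m=2, num_passes=K, gamma0=0.0,
                                         rng=random.Random(1))
print(sorted(estimate), used, "budget 2mn =", 2 * 2 * n)
```

Because roughly half of the null components are dropped on each pass, the number of measurements actually spent stays close to $2mn$ even though $K$ passes are made.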

D. Sequential Thresholding Satisfies Budget 

The number of measurements used by sequential thresholding satisfies the overall measurement budget $N \le 2mn$ in expectation. The expected number of measurements is

$$E\left[ \sum_{k=0}^{K-1} m |S_k| \right] \le \sum_{k=0}^{K-1} \left( \frac{m(n-s)}{2^k} + ms \right) \le 2m(n-s) + msK$$

since, in expectation, we eliminate half of the remaining null components (and perhaps some following the alternative, hence the first inequality) on each of the $K$ passes. Our interest is in high-dimensional limits of $n$ and $s$. Suppose that $sK$ grows sublinearly with $n$. Then for any $\epsilon > 0$ there exists an $N_\epsilon$ such that $E\left[ \sum_{k=0}^{K-1} m |S_k| \right] \le 2(1+\epsilon)mn$ for every $n > N_\epsilon$. We suppress the factor $1+\epsilon$ as we proceed, as it does not affect our results.
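As a quick numeric check of the bound, the sum over passes can be evaluated directly. The sketch below only evaluates the stated bound; the function name and the values of `n`, `s`, `m`, and `K` are assumptions chosen for illustration.

```python
def expected_measurement_bound(n, s, m, K):
    """Sum over passes k = 0..K-1 of m*(n - s)/2**k + m*s."""
    return sum(m * (n - s) / 2**k + m * s for k in range(K))

# Assumed values: a large problem with a very sparse support.
n, s, m, K = 10**6, 20, 1, 30
bound = expected_measurement_bound(n, s, m, K)
print(bound, 2 * m * (n - s) + m * s * K, 2 * m * n)
```

With these values the bound exceeds $2mn$ only by the term $msK$, which is negligible relative to $n$ when $sK$ grows sublinearly.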

E. Implementations 

There are two possible implementations of sequential 
thresholding which we refer to as parallel and scanning. 

parallel: The parallel implementation measures and tests all 
n components in parallel according to the procedure. 

scanning: The scanning implementation measures and tests the $n$ components in a sequence (which can be arbitrary). For example, the scanning implementation can begin with component $i = 1$ and repeatedly measure and threshold the observations up to $K$ times. If an observation falls below the threshold at any point, then the scanning procedure immediately moves on to the next component. If $K$ observations are made without an observation falling below the threshold, then the component is added to the set $S_K$. The expected number of observations obeys the same bound as derived above.

The two implementations are equivalent from a theoretical perspective. The parallel implementation may be more natural for large-scale experimental designs (e.g., in the biological sciences), whereas the scanning implementation is more appropriate in communications applications such as spectrum sensing. The latter also reveals natural connections between sequential thresholding and sequential probability ratio tests.
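The scanning variant handles one component at a time and stops as soon as a statistic fails the threshold. A minimal sketch, assuming the caller supplies a hypothetical `draw_statistic` callable that returns a fresh test statistic for the component being scanned:

```python
import random

def scan_component(draw_statistic, num_passes, gamma0):
    """Return (kept, passes_used); stop at the first statistic not above gamma0."""
    for k in range(1, num_passes + 1):
        if draw_statistic() <= gamma0:
            return False, k
    return True, num_passes

# Illustrative use (assumed Gaussian model, m = 2 draws per statistic).
rng = random.Random(3)
null_stat = lambda: sum(rng.gauss(0.0, 1.0) for _ in range(2)) / 2
print(scan_component(null_stat, num_passes=15, gamma0=0.0))
```

A null component is abandoned after two statistics on average, which is where the scanning implementation gets the same expected cost as the parallel one.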

F. Connection to Sequential Probability Ratio Tests 

As we will show in the following section, in the high-dimensional limit ($n \to \infty$) sequential thresholding can drive the probability of error to zero if the divergence between the null and alternative distributions is $\log |S|$ times a small constant. This specializes in the Gaussian setting to the requirement that the difference between the means is at least $\sqrt{4 \log |S|}$, which compares favorably to the requirement that the difference exceeds $\sqrt{2 \log n}$ for non-sequential methods.

In fact, the $\log |S|$ dependence of sequential thresholding is optimal, up to constant factors. This follows from well-known results in sequential testing. Let $\hat{S}$ denote the result of any testing procedure based on $n$ local (component-wise) tests of the form $H_0 : i \notin S$ against $H_1 : i \in S$. Each test is based on the sequential observations $y_{i,1}, \dots, y_{i,N}$, and the stopping time of the test is the value of $N$ (possibly random) when the decision is made.

Suppose that each individual test has false-positive and false-negative error probabilities less than $\alpha := \epsilon/(n - |S|)$ and $\beta := \epsilon/|S|$, respectively. Then the expected total number of errors is $E|\hat{S} \cap S^c| + E|\hat{S}^c \cap S| \le 2\epsilon$. It is necessary that this expected number tend to zero in order for the probability of error, $P(\hat{S} \neq S)$, to tend to zero. With the above specifications for the two types of error, it is possible to design a sequential probability ratio test (SPRT) for each component.

The SPRT computes a sequence of likelihood ratios, where $\ell_{i,n}$ is the likelihood ratio of $y_{i,1}, \dots, y_{i,n}$, $n \ge 1$. The SPRT terminates when $\ell_{i,n} \ge B$ or $\ell_{i,n} \le A$, where the thresholds $A$ and $B$ are determined by the equations $\alpha = B^{-1}(1 - \beta)$ and $\beta = A(1 - \alpha)$ (see [3], p. 11). Note that, unlike sequential thresholding, the SPRT requires knowledge of both distributions as well as the level of sparsity. Since such information is usually unavailable in applications, we advocate the use of sequential thresholding instead; it requires only crude knowledge of the null and nothing about the alternative or sparsity level.
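For comparison, a single-component SPRT with the thresholds above can be sketched as follows. This is an illustration under stated assumptions: the unit-variance Gaussian case with $\theta_0 = 0$, for which each log-likelihood ratio increment is $\theta_1 y - \theta_1^2/2$; the function name `sprt_gaussian`, the `draw` callable, and the `max_samples` cap are hypothetical.

```python
import math

def sprt_gaussian(draw, theta1, alpha, beta, max_samples=10_000):
    """Wald's SPRT for N(0,1) against N(theta1,1); returns (decision, samples_used)."""
    log_a = math.log(beta / (1 - alpha))   # lower threshold, from beta = A(1 - alpha)
    log_b = math.log((1 - beta) / alpha)   # upper threshold, from alpha = (1 - beta)/B
    llr = 0.0
    for count in range(1, max_samples + 1):
        y = draw()
        llr += theta1 * y - theta1**2 / 2  # log f(y|theta1) - log f(y|0)
        if llr >= log_b:
            return 1, count                # decide i is in S
        if llr <= log_a:
            return 0, count                # decide i is not in S
    return None, max_samples               # cap reached without a decision

# Deterministic illustration: constant observations equal to theta1.
print(sprt_gaussian(lambda: 2.0, theta1=2.0, alpha=0.01, beta=0.01))
```

Note that the call needs `theta1`, `alpha`, and `beta`, that is, the alternative distribution and (through $\alpha$ and $\beta$) the sparsity level, which sequential thresholding does not require.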

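From the Wald equation, the expected stopping time of the SPRT per index is (approximately) [3]

$$E_0[N'] \approx \frac{1}{\mu_0} \left[ \alpha \log \frac{1-\beta}{\alpha} + (1-\alpha) \log \frac{\beta}{1-\alpha} \right],$$

$$E_1[N'] \approx \frac{1}{\mu_1} \left[ (1-\beta) \log \frac{1-\beta}{\alpha} + \beta \log \frac{\beta}{1-\alpha} \right],$$

where $E_i$ denotes the expectation under $f(\cdot \mid \theta_i)$ and $\mu_i := E_i\left[ \log \frac{f(y \mid \theta_1)}{f(y \mid \theta_0)} \right]$, $i = 0, 1$. In our case $\alpha = \epsilon/(n - |S|)$ and $\beta = \epsilon/|S|$, and as $\epsilon \to 0$ we have

$$E_0[N'] \approx \frac{1}{\mu_0} \log \frac{\epsilon}{|S|}, \qquad E_1[N'] \approx \frac{1}{\mu_1} \log \frac{n - |S|}{\epsilon}.$$

If $|S| \ll n$, then the expected total number of measurements made by all $n$ SPRTs is

$$E[N] = (n - |S|) E_0[N'] + |S| E_1[N'] \approx \frac{n}{\mu_0} \log \frac{\epsilon}{|S|}.$$

Note that $\mu_0 = -D_0 := -D(f(\cdot \mid \theta_0) \,\|\, f(\cdot \mid \theta_1))$, the KL divergence of $f(\cdot \mid \theta_1)$ from $f(\cdot \mid \theta_0)$, so the expected total number of observations made by the $n$ SPRTs is

$$E[N] \approx \frac{n}{D_0} \log \frac{|S|}{\epsilon}.$$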

It follows from the optimality of the SPRT that no other 
component-wise testing procedure with e error-rate requires 
fewer observations. Now let us constrain this expected total to 
be less than or equal to 2mn. This yields a necessary condition 
for controlling the probability of error of any sequential test: 



$$D(f(\cdot \mid \theta_0) \,\|\, f(\cdot \mid \theta_1)) \ge \frac{1}{2m} \log \frac{|S|}{\epsilon}.$$

III. Main Results

The main results rely on the extremal properties of the test statistic. We say that a testing procedure is reliable if it drives the probability of error to zero in the high-dimensional limit. More formally, consider a sequence of multiple testing problems indexed by dimension $n$. Let $S(n)$ denote the true support set and let $\hat{S}(n) = \hat{S}_\tau$ (non-sequential procedure with threshold $\tau$) or $\hat{S}(n) = S_K$ (sequential procedure with $K$ passes). We define a notion of reliability as follows.

Definition III.1. (Reliability) Let $\mathcal{E}$ denote the error event $\{\hat{S}(n) \neq S(n)\}$. We say that the support set estimator $\hat{S}(n)$ is reliable if $\lim_{n \to \infty} P(\mathcal{E}) = 0$; conversely, an estimator $\hat{S}(n)$ is unreliable if $\lim_{n \to \infty} P(\mathcal{E}) > 0$.



To simplify notation we will not explicitly indicate the dependence of the statistics on $n$. We show that the non-sequential testing procedure in (3) is unreliable at every threshold level $\tau$ if

$$\lim_{n \to \infty} P\left( \frac{\max_{i \notin S} T_{i,2m}}{\mathrm{median}(T_{i,2m} \mid \theta_1)} > 1 \right) = 1. \tag{4}$$

Conversely, sequential testing according to sequential thresholding is reliable if

$$\lim_{n \to \infty} P\left( \frac{\min_{k=1}^{K} \min_{i \in S} T^{(k)}_{i,m}}{\mathrm{median}(T_{i,m} \mid \theta_0)} < 1 \right) = 0, \tag{5}$$

and $K = (1 + \epsilon) \log_2 n$, for any $\epsilon > 0$. We are interested in ranges of $\theta_1 > \theta_0$ that satisfy the conditions above. In many cases of interest, (4) and (5) hold simultaneously for a wide range of parameter values. This implies that there are many regimes in which sequential methods are reliable, but non-sequential methods are not.

For example, we show in Section IV-A that if the underlying component distributions are unit variance Gaussian with means $\theta_0 = 0$ and $\theta_1 > 0$, then the non-sequential procedure (3) is unreliable if $\theta_1 < \sqrt{\frac{1}{m} \log n}$ whereas sequential thresholding is reliable if $\theta_1 > \sqrt{\frac{2}{m} \log(s \log_2 n)}$. The size of the sparse support, $s$, is typically much smaller than the overall dimension $n$, and so there are many cases in which the sequential method is reliable but the non-sequential method is unreliable. The gap between the two conditions can be exponentially large in terms of the dimension $n$. As a specific example, if $s = \log n$, then the sequential method is reliable if $\theta_1 > 2\sqrt{\log \log n}$ and the non-sequential method is unreliable if $\theta_1 < \sqrt{2 \log n}$.
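To get a feel for the size of this gap, the two signal levels from the specific example ($s = \log n$) can be tabulated for a few dimensions. The sketch below simply evaluates $\sqrt{2 \log n}$ and $2\sqrt{\log \log n}$; the function name `gaussian_thresholds` and the chosen values of $n$ are illustrative assumptions.

```python
import math

def gaussian_thresholds(n):
    """Return (non-sequential level, sequential level) for the s = log n example."""
    non_sequential = math.sqrt(2 * math.log(n))
    sequential = 2 * math.sqrt(math.log(math.log(n)))
    return non_sequential, sequential

for n in (10**3, 10**6, 10**12):
    non_seq, seq = gaussian_thresholds(n)
    print(f"n = {n:>14}: non-sequential {non_seq:.2f}, sequential {seq:.2f}")
```

The non-sequential level grows with $\sqrt{\log n}$ while the sequential level grows only with $\sqrt{\log \log n}$, so the interval of $\theta_1$ values on which only the sequential method is reliable widens as $n$ increases.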

A. Limitation of Non-Sequential Testing 

Theorem III.2. If (4) holds, then the non-sequential procedure in (3) is unreliable. Specifically, if $\mathcal{E}_\tau$ is the error event $\{\hat{S}_\tau \neq S\}$, then for every $\tau$

$$\lim_{n \to \infty} P(\mathcal{E}_\tau) \ge \frac{1}{2}.$$

Proof: The non-sequential testing procedure accepts the null hypothesis if the test statistic $T_{i,2m}$ is less than some threshold, $\tau$, and conversely, rejects the null hypothesis if $T_{i,2m} > \tau$. The probability of error at threshold level $\tau$ is

$$P(\mathcal{E}_\tau) = P\left( \bigcup_{i \notin S} \{T_{i,2m} > \tau\} \cup \bigcup_{i \in S} \{T_{i,2m} \le \tau\} \right)$$

and the minimum probability of error is $\min_\tau P(\mathcal{E}_\tau)$. Now suppose we take $\tau = \mathrm{median}(T_{i,2m} \mid \theta_1)$, the median value of the test statistic under the alternative. At this threshold level, the false-negative rate would be 1/2, and so the overall probability of error would be at least 1/2. It follows that the minimum probability of error can be bounded from below by

$$\min_\tau P(\mathcal{E}_\tau) \ge \min\left( \frac{1}{2},\; P\left( \bigcup_{i \notin S} \{T_{i,2m} > \mathrm{median}(T_{i,2m} \mid \theta_1)\} \right) \right).$$

According to (4), the second argument above tends to 1 as $n \to \infty$, which completes the proof. ■



B. Capability of Sequential Thresholding 

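Theorem III.3. If (5) holds, then sequential thresholding is reliable if $K = (1 + \epsilon) \log_2 n$, for $\epsilon > 0$. Specifically, if $\mathcal{E}_\epsilon$ is the error event $\{S_K \neq S\}$, then for any $\epsilon > 0$

$$\lim_{n \to \infty} P(\mathcal{E}_\epsilon) = 0.$$

Proof: The probability of error is

$$P(\mathcal{E}_\epsilon) := P(S_K \neq S) = P\left( \{S \cap S_K^c \neq \emptyset\} \cup \{S^c \cap S_K \neq \emptyset\} \right) \le P(S \cap S_K^c \neq \emptyset) + P(S^c \cap S_K \neq \emptyset), \tag{6}$$

where the superscript $c$ denotes the complementation of the set. The upper bound on the probability of error consists of two terms, the false-negative and false-positive probabilities. The false-positive probability (second term in (6)) can be bounded as follows:

$$P(S^c \cap S_K \neq \emptyset) = P\left( \bigcup_{i \notin S} \bigcap_{k=1}^{K} \{T^{(k)}_{i,m} > \mathrm{median}(T_{i,m} \mid \theta_0)\} \right) \le \sum_{i \notin S} \left( P\left( T_{i,m} > \mathrm{median}(T_{i,m} \mid \theta_0) \right) \right)^K = \frac{n - |S|}{2^K},$$

where the last step follows since the probability a random variable exceeds its median is 1/2. Since $K = (1 + \epsilon) \log_2 n$, with $\epsilon > 0$, we have

$$\lim_{n \to \infty} P(S^c \cap S_K \neq \emptyset) = 0.$$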
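The false-positive bound $(n - |S|)/2^K$ is easy to evaluate numerically. A small sketch, assuming $K$ is rounded up to an integer (the function name and the parameter values are hypothetical):

```python
import math

def false_positive_bound(n, s, eps):
    """Union bound (n - s) / 2**K with K = ceil((1 + eps) * log2(n))."""
    K = math.ceil((1 + eps) * math.log2(n))
    return (n - s) / 2**K

for eps in (0.1, 0.5, 1.0):
    print(eps, false_positive_bound(10**6, 10, eps))
```

Since $2^K \ge n^{1+\epsilon}$, the bound is at most $n^{-\epsilon}$, which is why any fixed $\epsilon > 0$ suffices for the false-positive term to vanish.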

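Bounding the false-negative probability (first term in (6)) depends on the distribution of the test statistic under the alternative $\theta_1$:

$$P(S \cap S_K^c \neq \emptyset) = P\left( \bigcup_{k=1}^{K} \bigcup_{i \in S} \{T^{(k)}_{i,m} < \mathrm{median}(T_{i,m} \mid \theta_0)\} \right) = P\left( \min_{k=1}^{K} \min_{i \in S} T^{(k)}_{i,m} < \mathrm{median}(T_{i,m} \mid \theta_0) \right),$$

which, from (5), goes to zero in the limit, completing the proof. ■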

IV. Applications 

To illustrate the main results we consider three canonical settings arising in high-dimensional multiple testing. We again have in mind a sequence of problems and consider behavior in the high-dimensional limit. Thus, when we write $\theta < g(n)$ (or $\theta > g(n)$) we mean that the parameter $\theta$ may (must) grow with dimension $n$ no faster (slower) than the function $g(n)$.



A. Gaussian Model 

Gaussian noise models are commonly assumed in multiple testing problems arising in the biological sciences (e.g., testing which of many genes or proteins are involved in a certain process or function). For example, a multistage testing procedure similar in spirit to sequential thresholding was used to determine genes important for virus replication in [2]. Consider a high-dimensional hypothesis test in additive Gaussian noise where the parameter represents the mean of the distribution. We assume the null hypothesis follows zero mean ($\theta_0 = 0$), unit variance Gaussian statistics; the alternative hypothesis, mean $\theta_1 > 0$, unit variance:

$$y_i \sim \begin{cases} \mathcal{N}(0, 1), & i \notin S \\ \mathcal{N}(\theta_1, 1), & i \in S. \end{cases}$$

1) Non-Sequential Testing: We make $2m$ measurements of each component of $\theta$. The test statistic again follows a normal distribution:

$$T_{i,2m} = \frac{1}{2m} \sum_{j=1}^{2m} y_{i,j} \sim \begin{cases} \mathcal{N}\left(0, \frac{1}{2m}\right), & i \notin S \\ \mathcal{N}\left(\theta_1, \frac{1}{2m}\right), & i \in S. \end{cases} \tag{7}$$

Corollary IV.1. If $\theta_1 < \sqrt{\frac{\log(n-s)}{m}}$, then the non-sequential testing procedure in (3) is unreliable, i.e., $\min_\tau P(\mathcal{E}_\tau) \ge 1/2$.

Proof: For the test statistic in equation (7), we satisfy (4) provided $\mathrm{median}(T_{i,2m} \mid \theta_1) < \sqrt{\frac{\log(n-s)}{m}}$ (see, for example, [5]). By Theorem III.2 and since $\mathrm{median}(T_{i,2m} \mid \theta_1) = \theta_1$, if

$$\theta_1 < \sqrt{\frac{\log(n-s)}{m}}$$

then non-sequential thresholding is unreliable. ■
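2) Sequential Testing: Sequential thresholding makes $m$ measurements of each component in the set $S_k$ at each step. The test statistic follows a normal distribution:

$$T^{(k)}_{i,m} = \frac{1}{m} \sum_{j=1}^{m} y^{(k)}_{i,j} \sim \begin{cases} \mathcal{N}\left(0, \frac{1}{m}\right), & i \notin S \\ \mathcal{N}\left(\theta_1, \frac{1}{m}\right), & i \in S. \end{cases} \tag{8}$$

Corollary IV.2. If $\theta_1 > \sqrt{\frac{2}{m} \log(s \log_2 n)}$, then sequential thresholding is reliable.

Proof: In this case, equation (5) is satisfied provided $\mathrm{median}(T_{i,m} \mid \theta_0) < \theta_1 - \sqrt{\frac{2}{m} \log Ks}$ (see, for example, [5]). Since $\mathrm{median}(T_{i,m} \mid \theta_0) = 0$, Theorem III.3 tells us that provided

$$\theta_1 > \sqrt{\frac{2}{m} \log Ks}$$

with $K = (1 + \epsilon) \log_2 n$, we reliably recover $S$. ■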



B. Gamma Model: Spectrum Sensing 

Often termed hole detection, the objective of spectrum sensing is to identify unoccupied communication bands in the electromagnetic spectrum. Most of the bands will be occupied by primary users, but these users may come and go, leaving certain bands momentarily open and available for secondary users. Recent work in spectrum sensing has given considerable attention to such scenarios, including some work employing adaptive sensing methods (see, for example, [6], [7]).

Following the notation throughout this paper, channel occupation is parameterized by $\theta$, with $\theta_0$ denoting the signal plus noise power in the occupied bands, and $\theta_1$ representing the noise only power in the unoccupied bands. Without loss of generality, we let $\theta_1 = 1$. The statistics follow a complex Gaussian distribution: $y_i \sim \mathcal{CN}(0, \theta)$. From [8], making $m$ measurements of each index, the likelihood ratio test statistic follows a Gamma distribution:

$$T^{(k)}_{i,m} = \sum_{j=1}^{m} |y^{(k)}_{i,j}|^2 \sim \begin{cases} \mathrm{Gamma}(m, \theta_0), & i \notin S \\ \mathrm{Gamma}(m, 1), & i \in S. \end{cases} \tag{10}$$

Remarkably, the sequential testing procedure is reliable, to within constant factors, if $\theta_0$ grows as $\log(s \log_2 n)$, but the non-sequential testing procedure is unreliable if $\theta_0$ grows as $(n - s)^{\frac{1}{2m}}$. This implies, if $s = \log n$, then the gap between these conditions is doubly exponential in $n$.
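The two growth rates can be compared numerically for the case $s = \log n$. The sketch below ignores the constant factors mentioned above and just evaluates $\log(s \log_2 n)$ against $(n - s)^{1/(2m)}$; the function name `gamma_snr_levels` and the values of `n` and `m` are assumptions for illustration.

```python
import math

def gamma_snr_levels(n, m):
    """Return (sequential level, non-sequential level) for s = log n."""
    s = math.log(n)
    sequential = math.log(s * math.log2(n))
    non_sequential = (n - s) ** (1 / (2 * m))
    return sequential, non_sequential

for n in (10**4, 10**6, 10**9):
    seq, non_seq = gamma_snr_levels(n, m=1)
    print(f"n = {n:>11}: sequential {seq:.2f}, non-sequential {non_seq:.1f}")
```

One level is polynomial in $n$ and the other is of order $\log \log n$, which is the doubly exponential gap described in the text.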

Since we are interested in detecting the sparse set of vacancies in the spectrum, our hypothesis test is reversed. We reject the null hypothesis (occupied component) if the test statistic falls below (rather than above) a certain threshold. In this case, the inequalities in the key conditions (4) and (5) are reversed: specifically, the non-sequential thresholding procedure is unreliable if

$$\lim_{n \to \infty} P\left( \frac{\min_{i \notin S} T_{i,2m}}{\mathrm{median}(T_{i,2m} \mid \theta_1)} < 1 \right) = 1 \tag{11}$$

and sequential thresholding is reliable if

$$\lim_{n \to \infty} P\left( \frac{\max_{k=1}^{K} \max_{i \in S} T^{(k)}_{i,m}}{\mathrm{median}(T_{i,m} \mid \theta_0)} > 1 \right) = 0. \tag{12}$$



1) Non-Sequential Testing: In the non-sequential procedure (3), we make $2m$ measurements per index. The distribution of the test statistic follows a gamma distribution with shape parameter $2m$.

Corollary IV.3. If $\theta_0 < 2(m-1)(n-s)^{\frac{1}{2m}}$, then the non-sequential procedure in (3) is unreliable.

Proof: In this case, because the hypothesis test is reversed, we aim to satisfy (11). Since $\mathrm{median}(T_{i,2m} \mid \theta_1) > 2(m-1)$, we have

$$P\left( \frac{\min_{i \notin S} T_{i,2m}}{\mathrm{median}(T_{i,2m} \mid \theta_1)} < 1 \right) \ge P\left( \frac{\min_{i \notin S} T_{i,2m}}{2(m-1)} < 1 \right).$$

If $2(m-1) > \frac{\theta_0}{(n-s)^{\frac{1}{2m}}}$, the right hand side above goes to 1 as $n$ grows large (see Appendix A). Together with Theorem III.2 this implies that if $\theta_0 < 2(m-1)(n-s)^{\frac{1}{2m}}$ then the non-sequential procedure is unreliable. ■



2) Sequential Testing: Sequential thresholding makes $m$ measurements of each component in the set $S_k$ at each step. The test statistic follows the Gamma distributions in (10).

Corollary IV.4. If $\theta_0 > \frac{\log(s \log_2 n)}{m-1}$, then sequential thresholding is reliable.

Proof: It suffices to show (12) is satisfied. For all $m$ and $\theta_0$, we have $\mathrm{median}(T_{i,m} \mid \theta_0) > \theta_0(m-1)$. We upper bound (12) by

$$\lim_{n \to \infty} P\left( \frac{\max_{k=1}^{K} \max_{i \in S} T^{(k)}_{i,m}}{\theta_0(m-1)} > 1 \right),$$

which goes to zero in the limit provided $\theta_0(m-1) > \log Ks$ (see Appendix B). Together with Theorem III.3, if

$$\theta_0 > \frac{\log Ks}{m-1} \tag{13}$$

with $K = (1 + \epsilon) \log_2 n$, then sequential thresholding is reliable. ■

C. Poisson Model: Photon-based Detection 

Lastly we consider a situation in which the component distributions are Poisson. This model arises naturally in testing problems involving photon counting (e.g., optical communications or biological applications using fluorescent markers). We let the (sparse) alternative follow a Poisson with fixed rate $\theta_1$, and the null hypothesis a rate $\theta_0$, $\theta_0 > \theta_1$:

$$y_i \sim \begin{cases} \mathrm{Poisson}(\theta_0), & i \notin S \\ \mathrm{Poisson}(\theta_1), & i \in S. \end{cases}$$

Note that as $\theta_0 > \theta_1$, our hypothesis test is reversed as in the spectrum sensing example (and equations (11) and (12)).

The test statistic is a sum of the individual measurements, again following a Poisson distribution. In this setting, the gap between sequential and non-sequential testing is similar to that of the Gaussian case. Proofs are left to Appendices C and D.

Corollary IV.5. For any fixed $\theta_1$, if $\theta_0 < \frac{\log(n-s)}{2m}$, non-sequential thresholding is unreliable.

Corollary IV.6. For any fixed $\theta_1$, if $\theta_0 > \frac{\log(s \log_2 n)}{m}$, sequential thresholding is reliable.
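The non-sequential Poisson condition can be checked numerically through the event that at least one null statistic equals zero, which forces an error whenever the median under the alternative is positive. The sketch below evaluates $1 - (1 - e^{-2m\theta_0})^{n-s}$, the probability of that event for independent Poisson($2m\theta_0$) sums; the function name and the parameter values are hypothetical.

```python
import math

def prob_some_null_is_zero(n, s, m, theta0):
    """P(min over the n - s null statistics is 0) for Poisson(2*m*theta0) sums."""
    p_zero = math.exp(-2 * m * theta0)
    return 1 - (1 - p_zero) ** (n - s)

# Assumed values: n = 10**6, s = 10, m = 1, so log(n - s) / (2m) is about 6.9.
for theta0 in (3.0, 6.9, 12.0):
    print(theta0, prob_some_null_is_zero(10**6, 10, 1, theta0))
```

Below the level $\log(n-s)/(2m)$ the probability is essentially 1, and well above it the probability drops toward 0, in line with Corollary IV.5.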

V. Conclusion 

This paper studied the problem of high-dimensional testing and sparse recovery from the perspective of sequential analysis. The gap between the null parameter $\theta_0$ and the alternative $\theta_1$ plays a crucial role in this problem. We derived necessary conditions for reliable recovery in the non-sequential setting and contrasted them with sufficient conditions for reliable recovery using the proposed sequential testing procedure. Applications of the main results to several commonly encountered models show that sequential testing can be exponentially (in dimension $n$) more sensitive to the difference between the null and alternative distributions, implying that subtle cases can be much more reliably determined using sequential methods.



Appendix 

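A. Gamma Non-Sequential

The cumulative distribution function of $\mathrm{Gamma}(2m, \theta_0)$ is given as

$$F(\gamma) = 1 - e^{-\gamma/\theta_0} \sum_{\ell=0}^{2m-1} \frac{1}{\ell!} \left( \frac{\gamma}{\theta_0} \right)^{\ell},$$

hence,

$$P\left( \min_{i \notin S} T_{i,2m} < \gamma \right) = 1 - \left( e^{-\gamma/\theta_0} \sum_{\ell=0}^{2m-1} \frac{1}{\ell!} \left( \frac{\gamma}{\theta_0} \right)^{\ell} \right)^{n-s}.$$

Letting $\gamma = \frac{\theta_0}{(n-s)^{\frac{1}{2m}}}$ and taking the limit, it can be shown

$$\lim_{n \to \infty} 1 - \left( e^{-(n-s)^{-\frac{1}{2m}}} \sum_{\ell=0}^{2m-1} \frac{1}{\ell!} (n-s)^{-\frac{\ell}{2m}} \right)^{n-s} = 1 - e^{-\frac{1}{(2m)!}}.$$

If $\gamma > \frac{\theta_0}{(n-s)^{\frac{1}{2m}}}$, then

$$\lim_{n \to \infty} P\left( \frac{\min_{i \notin S} T_{i,2m}}{\gamma} < 1 \right) = 1.$$

B. Gamma Sequential

The cumulative distribution function of $\mathrm{Gamma}(m, 1)$ is given as

$$F(\gamma) = 1 - e^{-\gamma} \sum_{\ell=0}^{m-1} \frac{\gamma^{\ell}}{\ell!},$$

hence,

$$P\left( \max_{k=1}^{K} \max_{i \in S} T^{(k)}_{i,m} > \gamma \right) = 1 - \left( 1 - e^{-\gamma} \sum_{\ell=0}^{m-1} \frac{\gamma^{\ell}}{\ell!} \right)^{Ks}.$$

Letting $\gamma = (1 + \epsilon) \log Ks$, for some $\epsilon > 0$, we have

$$\lim_{n \to \infty} 1 - \left( 1 - \frac{1}{(Ks)^{1+\epsilon}} \sum_{\ell=0}^{m-1} \frac{((1+\epsilon) \log Ks)^{\ell}}{\ell!} \right)^{Ks} = 0.$$

C. Poisson Non-Sequential

The likelihood ratio statistic is distributed as

$$T_{i,2m} = \sum_{j=1}^{2m} y_{i,j} \overset{\text{iid}}{\sim} \begin{cases} \mathrm{Poisson}(2m\theta_0), & i \notin S \\ \mathrm{Poisson}(2m\theta_1), & i \in S. \end{cases}$$

It suffices to show

$$\lim_{n \to \infty} P\left( \frac{\min_{i \notin S} T_{i,2m}}{\mathrm{median}(T_{2m} \mid \theta_1)} < 1 \right) = 1$$

for any $\theta_0 < \frac{\log(n-s)}{2m}$. The bound we derive is loose, but sufficient to show the adaptive scheme is superior. First, we assume that $\mathrm{median}(T_{2m} \mid \theta_1) > 0$. Next we have

$$P\left( \min_{i \notin S} T_{i,2m} < \mathrm{median}(T_{2m} \mid \theta_1) \right) \ge P\left( \min_{i \notin S} T_{i,2m} = 0 \right) = 1 - \left( 1 - e^{-2m\theta_0} \right)^{n-s}.$$

If $2m\theta_0 < \log(n-s)$, then

$$\lim_{n \to \infty} 1 - \left( 1 - e^{-2m\theta_0} \right)^{n-s} = 1,$$

which is also true provided $\theta_0 < \frac{\log(n-s)}{2m}$ and concludes the proof.

D. Poisson Sequential

In sequential thresholding, for each $i \in S_k$

$$T^{(k)}_{i,m} = \sum_{j=1}^{m} y^{(k)}_{i,j} \overset{\text{iid}}{\sim} \begin{cases} \mathrm{Poisson}(m\theta_0), & i \notin S \\ \mathrm{Poisson}(m\theta_1), & i \in S. \end{cases}$$

We need to show, for the test statistic above,

$$\lim_{n \to \infty} P\left( \frac{\max_{k=1}^{K} \max_{i \in S} T^{(k)}_{i,m}}{\mathrm{median}(T_m \mid \theta_0)} > 1 \right) = 0.$$

First, we note $\mathrm{median}(T_m \mid \theta_0) > m\theta_0 - 1$. Hence,

$$P\left( \max_{k=1}^{K} \max_{i \in S} T^{(k)}_{i,m} > \mathrm{median}(T_m \mid \theta_0) \right) \le P\left( \max_{k=1}^{K} \max_{i \in S} T^{(k)}_{i,m} > m\theta_0 - 1 \right).$$

We can bound the probability of a single event by Chernoff's bound [9], p. 166. For $T_{i,m} \sim \mathrm{Poisson}(m\theta_1)$ we have:

$$P(T_{i,m} > \gamma) \le e^{-m\theta_1 - \gamma \left( \log\left( \frac{\gamma}{m\theta_1} \right) - 1 \right)},$$

which implies

$$P\left( \max_{k=1}^{K} \max_{i \in S} T^{(k)}_{i,m} > \gamma \right) \le 1 - \left( 1 - e^{-m\theta_1 - \gamma \left( \log\left( \frac{\gamma}{m\theta_1} \right) - 1 \right)} \right)^{Ks}.$$

Letting $\gamma = \log Ks$ and taking the limit as $n \to \infty$ of the expression above for any fixed $\theta_1$, we conclude

$$\lim_{n \to \infty} P\left( \frac{\max_{k=1}^{K} \max_{i \in S} T^{(k)}_{i,m}}{\log Ks} > 1 \right) = 0.$$

Thus, if $\log Ks < m\theta_0 - 1$, or equivalently

$$\theta_0 > \frac{\log Ks + 1}{m},$$

sequential thresholding is reliable.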